Clusterability Detection and Cluster Initialization
Abstract
The need for a preliminary assessment of the clustering tendency, or clusterability, of massive data sets is known. A good clusterability detection method should inform the decision of whether to cluster at all, and should also provide useful seed input to the chosen clustering algorithm. We present a framework for defining the clusterability of a data set from a distance-based perspective. We discuss a graph-based system for detecting clusterability and generating seed information, including an estimate of k, the number of clusters in the data set, which is an input parameter to many distance-based clustering methods. The output of our method is tunable to accommodate a wide variety of clustering methods. We have conducted a number of experiments using our methodology with stock market data and with the well-known BIRCH data sets, in two as well as higher dimensions. Based on our experiments and results, we find that our methodology can serve as a basis for future work in this area. We report our results and discuss promising future directions.
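The abstract does not spell out the graph construction, so the following is only a minimal sketch of one generic distance-based, graph-based idea, not the authors' method. It assumes a hypothetical distance threshold `tau`: points closer than `tau` are linked, each connected component with at least `min_size` points is counted as a candidate cluster, and the component centroids are returned as seeds. The function name `threshold_graph_seeds` and both parameters are illustrative assumptions.

```python
import math
from collections import defaultdict


def threshold_graph_seeds(points, tau, min_size=2):
    """Estimate k and seed points from a distance-threshold graph.

    Assumption (not from the paper): two points are adjacent when their
    Euclidean distance is at most tau; components smaller than min_size
    are treated as noise and ignored.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair of points that lies within the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= tau:
                parent[find(i)] = find(j)

    # Group points by the root of their component.
    components = defaultdict(list)
    for i in range(n):
        components[find(i)].append(points[i])

    # One seed (the centroid) per sufficiently large component.
    seeds = []
    for members in components.values():
        if len(members) >= min_size:
            dim = len(members[0])
            seeds.append(
                tuple(sum(p[d] for p in members) / len(members) for d in range(dim))
            )
    return len(seeds), seeds


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
    k, seeds = threshold_graph_seeds(data, tau=1.5)
    print(k, seeds)
```

In this sketch the threshold plays the role of a tuning knob: a small `tau` yields many small components and a large `tau` merges everything into one. A result of zero or one component across a range of thresholds could be read as a hint that the data are not worth clustering, and the returned count and centroids could be passed as k and initial centers to a method such as k-means. The pairwise loop is quadratic in the number of points, so it is only suitable for illustration on small inputs.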
Related resources
Clusterability Detection and Initial Seed Selection in Large Data Sets
The need for a preliminary assessment of the clustering tendency or clusterability of massive data sets is known. A good clusterability detection method should serve to influence a decision as to whether to cluster at all, as well as provide useful seed input to a chosen clustering algorithm. We present a framework for the definition of the clusterability of a data set from a distance-based persp...
Clustering Oligarchies
We investigate the extent to which clustering algorithms are robust to the addition of a small, potentially adversarial, set of points. Our analysis reveals radical differences in the robustness of popular clustering methods. k-means and several related techniques are robust when data is clusterable, and we provide a quantitative analysis capturing the precise relationship between clusterabilit...
Which Data Sets are ‘Clusterable’? – A Theoretical Study of Clusterability
We investigate measures of the clusterability of data sets. Namely, ways to define how ‘strong’ or ‘conclusive’ is the clustering structure of a given data set. We address this issue with generality, aiming for conclusions that apply regardless of any particular clustering algorithm or any specific data generation model. We survey several notions of clusterability that have been discussed in th...
An Effective and Efficient Approach for Clusterability Evaluation
Clustering is an essential data mining tool that aims to discover inherent cluster structure in data. As such, the study of clusterability, which evaluates whether data possesses such structure, is an integral part of cluster analysis. Yet, despite their central role in the theory and application of clustering, current notions of clusterability fall short in two crucial aspects that render them...
Clusterability: A Theoretical Study
We investigate measures of the clusterability of data sets. Namely, ways to define how ‘strong’ or ‘conclusive’ is the clustering structure of a given data set. We address this issue with generality, aiming for conclusions that apply regardless of any particular clustering algorithm or any specific data generation model. We survey several notions of clusterability that have been discussed in th...
Publication date: 2000